Topological Detection of Trojaned Neural Networks

Neural Information Processing Systems

Deep neural networks are known to have security issues. One particular threat is the Trojan attack, in which attackers stealthily manipulate the model's behavior through Trojaned training samples that can later be exploited. Guided by basic neuroscientific principles, we discover subtle yet critical structural deviations characterizing Trojaned models. In our analysis we use topological tools, which allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models; compared to standard baselines, it displays better performance on multiple benchmarks.
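
As an illustration of the short-cut observation (a minimal sketch only, not the paper's persistent-homology pipeline; the function name and the outlier-scoring workflow are hypothetical), one can measure how strongly units in a shallow layer correlate with units in a deep layer across a batch of inputs:

```python
import numpy as np

def max_cross_layer_correlation(shallow_acts, deep_acts):
    """Largest |Pearson correlation| between any shallow and any deep unit.

    shallow_acts, deep_acts: arrays of shape (n_samples, n_units), holding
    per-example activations from a shallow and a deep layer. A model with
    a shallow-to-deep short-cut would be expected to show an unusually
    strong cross-layer dependency (an illustrative heuristic only).
    """
    s = (shallow_acts - shallow_acts.mean(0)) / (shallow_acts.std(0) + 1e-8)
    d = (deep_acts - deep_acts.mean(0)) / (deep_acts.std(0) + 1e-8)
    corr = s.T @ d / len(s)        # all shallow/deep unit pairs at once
    return np.abs(corr).max()

# Hypothetical usage: compare the statistic of a suspect model against a
# pool of known-clean models; an outlier suggests structural deviation.
rng = np.random.default_rng(0)
baseline = max_cross_layer_correlation(rng.normal(size=(256, 64)),
                                       rng.normal(size=(256, 32)))
print(f"clean-model baseline: {baseline:.3f}")
```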


Rethinking the Reverse-engineering of Trojan Triggers

Neural Information Processing Systems

Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g.,
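
For context, the input-space formulation that this paper critiques is typically an optimization over a trigger mask and pattern, as in Neural Cleanse. A minimal PyTorch sketch of that baseline formulation, assuming a trained `model` and a clean batch `images` are available (hyperparameters here are illustrative):

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, images, target_class, steps=500, lam=1e-2):
    """Optimize a mask m and pattern p so that (1 - m) * x + m * p is
    classified as target_class while the mask stays small (L1 penalty).

    `model` (trained classifier) and `images` (clean batch, [B, C, H, W])
    are assumed to be provided.
    """
    for param in model.parameters():          # optimize the trigger only
        param.requires_grad_(False)
    mask = torch.zeros(1, 1, *images.shape[2:], requires_grad=True)
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)               # mask values in [0, 1]
        p = torch.sigmoid(pattern)            # pattern values in [0, 1]
        stamped = (1 - m) * images + m * p    # apply candidate trigger
        loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```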




MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models

Shabgahi, Soheil Zibakhsh, Jandali, Yaman, Koushanfar, Farinaz

arXiv.org Artificial Intelligence

This paper proposes MergeGuard, a novel methodology for mitigation of AI Trojan attacks. Trojan attacks on AI models cause inputs embedded with triggers to be misclassified to an adversary's target class, posing a significant threat to the usability of models trained by an untrusted third party. The core of MergeGuard is a new post-training methodology for linearizing and merging fully connected layers, which we show simultaneously improves model generalizability and performance. Our proof-of-concept evaluation on Transformer models demonstrates that MergeGuard maintains model accuracy while decreasing the Trojan attack success rate, outperforming commonly used post-training Trojan mitigation methodologies based on fine-tuning. Utilizing Artificial Intelligence (AI) for automation is increasingly ingrained in various technical fields. Recent research has shown that larger Deep Neural Networks (DNNs) with greater expressive capacity can more effectively approximate complex real-world functions and achieve higher accuracy [1], [2]. As model architectures grow in size, so too do the datasets required to train these data-hungry models. To conserve resources, modern Machine Learning (ML) practitioners frequently rely on pretrained models or publicly available datasets, exposing themselves to the risk of maliciously manipulated models or tampered datasets.
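
The merging step at the core of the method can be illustrated with the standard identity for composing two linear maps: once the nonlinearity between two fully connected layers is linearized away, fc2(fc1(x)) = (W2 W1)x + (W2 b1 + b2) collapses into a single layer. A minimal PyTorch sketch of that algebraic step (not MergeGuard's actual implementation):

```python
import torch
import torch.nn as nn

def merge_linear_layers(fc1: nn.Linear, fc2: nn.Linear) -> nn.Linear:
    """Collapse two consecutive Linear layers (with the activation between
    them removed) into a single equivalent Linear layer:
        fc2(fc1(x)) = (W2 W1) x + (W2 b1 + b2)
    Illustrates only the merging step, not MergeGuard itself.
    """
    merged = nn.Linear(fc1.in_features, fc2.out_features)
    with torch.no_grad():
        merged.weight.copy_(fc2.weight @ fc1.weight)
        merged.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)
    return merged

# Sanity check: the merged layer matches the composed layers exactly.
fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 8)
x = torch.randn(4, 16)
assert torch.allclose(merge_linear_layers(fc1, fc2)(x), fc2(fc1(x)), atol=1e-5)
```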




On Trojan Signatures in Large Language Models of Code

Hussain, Aftab, Rabin, Md Rafiqul Islam, Alipour, Mohammad Amin

arXiv.org Artificial Intelligence

Trojan signatures, as described by Fields et al. (2021), are noticeable differences between the distribution of the trojaned-class parameters (weights) and that of the non-trojaned-class parameters of a trojaned model, which can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models such as Resnet, WideResnet, Densenet, and VGG. In this paper, we investigate such signatures in the classifier-layer parameters of large language models of source code. Our results suggest that trojan signatures do not generalize to LLMs of code. We found that trojaned code models are stubborn in concealing such signatures, even when the models were poisoned under more explicit settings (finetuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone detection and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large language models of code, and furthermore to demonstrate that detecting trojans only from the weights of such models is a hard problem.
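
A minimal sketch of a weight-based signature check in the spirit of Fields et al. (2021): score each output class of the final classifier layer by how far its weight statistics deviate from the other classes. The statistic and function name here are hypothetical simplifications:

```python
import numpy as np

def trojan_signature_scores(classifier_weights):
    """Per-class outlier scores for the final classifier layer.

    classifier_weights: array of shape (n_classes, n_features), one weight
    row per output class. Fields et al. (2021) look for a class whose
    weight distribution visibly deviates from the rest; here each class is
    scored by how far its mean weight sits from the other classes' means,
    in units of their standard deviation (an illustrative statistic only).
    """
    means = classifier_weights.mean(axis=1)
    scores = np.empty_like(means)
    for c in range(len(means)):
        others = np.delete(means, c)
        scores[c] = abs(means[c] - others.mean()) / (others.std() + 1e-8)
    return scores

# Hypothetical usage: a markedly higher score for one class would be a
# candidate trojan signature; per this paper, code LLMs tend not to show one.
w = np.random.default_rng(1).normal(size=(10, 768))
print(trojan_signature_scores(w).round(2))
```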


Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

Sahabandu, Dinuka, Xu, Xiaojun, Rajabi, Arezoo, Niu, Luyao, Ramasubramanian, Bhaskar, Li, Bo, Poovendran, Radha

arXiv.org Artificial Intelligence

We propose and analyze an adaptive adversary that can retrain a Trojaned DNN and is also aware of SOTA output-based Trojaned model detectors. We show that such an adversary can ensure (1) high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach is based on an observation that the high dimensionality of the DNN parameters provides sufficient degrees of freedom to simultaneously achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of parameters of a Trojaned model and detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the resulting (optimal) solution of this interactive game leads to the adversary successfully achieving the above objectives. In addition, we provide a greedy algorithm for the adversary to select a minimum number of input samples for embedding triggers. We show that for cross-entropy or log-likelihood loss functions used by the DNNs, the greedy algorithm provides provable guarantees on the needed number of trigger-embedded input samples. Extensive experiments on four diverse datasets -- MNIST, CIFAR-10, CIFAR-100, and SpeechCommand -- reveal that the adversary effectively evades four SOTA output-based Trojaned model detectors: MNTD, NeuralCleanse, STRIP, and TABOR.
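
The flavor of the greedy selection can be sketched as follows, assuming (hypothetically) that a marginal attack-success gain has been estimated per candidate sample; the paper's actual algorithm and its guarantees are specific to the cross-entropy and log-likelihood losses:

```python
import numpy as np

def greedy_trigger_subset(gain, target_asr):
    """Greedy selection of samples to poison, in the spirit of the paper's
    greedy algorithm (details differ; this is an illustrative sketch).

    gain: per-sample estimates of the marginal attack-success improvement
          from poisoning that sample (hypothetical, assumed precomputed).
    Greedily picks the highest-gain samples until the accumulated estimate
    crosses target_asr, returning the chosen indices.
    """
    order = np.argsort(gain)[::-1]       # best marginal gain first
    chosen, asr = [], 0.0
    for i in order:
        if asr >= target_asr:
            break
        chosen.append(int(i))
        asr += gain[i]                   # additivity is a simplification
    return chosen

# Hypothetical usage with simulated marginal gains for 1000 samples.
gains = np.random.default_rng(2).exponential(scale=0.002, size=1000)
subset = greedy_trigger_subset(gains, target_asr=0.95)
print(f"{len(subset)} samples selected")
```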


FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases

Fu, Chong, Zhang, Xuhong, Ji, Shouling, Wang, Ting, Lin, Peng, Feng, Yanghe, Yin, Jianwei

arXiv.org Artificial Intelligence

The Trojan attack on deep neural networks, also known as the backdoor attack, is a typical threat to artificial intelligence. A trojaned neural network behaves normally on clean inputs; however, if the input contains a particular trigger, the trojaned model exhibits attacker-chosen abnormal behavior. Although many backdoor detection methods exist, most of them assume that the defender has access to a set of clean validation samples or samples with the trigger, which may not hold in some crucial real-world cases, e.g., where the defender is the maintainer of a model-sharing platform. Thus, in this paper, we propose FreeEagle, the first data-free backdoor detection method that can effectively detect complex backdoor attacks on deep neural networks without relying on access to any clean samples or samples with the trigger. The evaluation results on diverse datasets and model architectures show that FreeEagle is effective against various complex backdoor attacks, even outperforming some state-of-the-art non-data-free backdoor detection methods.
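
The data-free idea can be sketched loosely as follows: for each class, synthesize a dummy input that maximizes that class's posterior, then inspect the induced posterior matrix for anomalous off-diagonal entries. Note that FreeEagle itself optimizes dummy intermediate representations rather than raw inputs; this is an illustrative simplification with hypothetical names:

```python
import torch
import torch.nn.functional as F

def classwise_posterior_matrix(model, input_shape, n_classes, steps=200):
    """For each target class, synthesize a dummy input that maximizes the
    class posterior, then record the full posterior that input induces.

    No real data is used. A backdoor tends to surface as an off-diagonal
    entry that stays anomalously high across rows. This is a loose sketch
    of the data-free idea only, not FreeEagle's exact algorithm.
    """
    model.eval()
    for param in model.parameters():     # optimize the dummy input only
        param.requires_grad_(False)
    rows = []
    for c in range(n_classes):
        x = torch.randn(1, *input_shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=0.05)
        for _ in range(steps):
            # Minimizing cross-entropy toward class c maximizes p(c | x).
            loss = F.cross_entropy(model(x), torch.tensor([c]))
            opt.zero_grad()
            loss.backward()
            opt.step()
        rows.append(F.softmax(model(x), dim=1).detach())
    return torch.cat(rows)  # (n_classes, n_classes) posterior matrix
```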